Executing Parallel Programs with Synchronization Bottlenecks Efficiently
Authors
Abstract
We propose a scheme within which parallel programs with potential synchronization bottlenecks run efficiently. In straightforward implementations that use basic locking schemes, the execution time of the program parts containing bottlenecks increases significantly as the number of processors grows. Our scheme brings the parallel performance of the bottleneck parts close to sequential performance while maintaining the efficiency with which the non-bottleneck parts run. Experiments on a 64-processor SMP and a 128-processor DSM machine confirmed that parallel programs implemented with our scheme perform much better than parallel programs implemented with other widely-used locking schemes.
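The abstract does not describe the proposed scheme itself, but the problem it targets can be illustrated concretely. The sketch below is not the paper's method; it only shows, under the assumption of a shared counter guarded by a single std::mutex, how a basic locking scheme serializes the bottleneck part as threads are added, next to a generic local-accumulation variant that approximates the low-contention behavior a good scheme aims for.

```cpp
// Minimal sketch (not the paper's scheme): a shared counter protected by one
// global mutex becomes a synchronization bottleneck as the thread count grows,
// contrasted with per-thread partial sums combined once at the end.
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

constexpr long kIncrementsPerThread = 1'000'000;

// Bottleneck version: every increment takes the same global lock,
// so the critical section is executed serially by all threads.
long run_with_global_lock(int num_threads) {
    long counter = 0;
    std::mutex lock;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (long i = 0; i < kIncrementsPerThread; ++i) {
                std::lock_guard<std::mutex> guard(lock);  // contended critical section
                ++counter;
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}

// Low-contention version: each thread accumulates privately and the partial
// results are merged once, so the lock is taken only num_threads times.
long run_with_local_accumulation(int num_threads) {
    long counter = 0;
    std::mutex lock;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            long local = 0;
            for (long i = 0; i < kIncrementsPerThread; ++i) ++local;
            std::lock_guard<std::mutex> guard(lock);
            counter += local;
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}

int main() {
    for (int threads : {1, 4, 16}) {
        auto time_run = [&](auto fn) {
            auto start = std::chrono::steady_clock::now();
            long result = fn(threads);
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                          std::chrono::steady_clock::now() - start).count();
            std::printf("threads=%2d result=%ld time=%lld ms\n",
                        threads, result, static_cast<long long>(ms));
        };
        time_run(run_with_global_lock);       // wall-clock time grows with thread count
        time_run(run_with_local_accumulation); // wall-clock time stays near sequential
    }
    return 0;
}
```

Compiled with a threads-enabled toolchain (e.g. g++ -O2 -pthread), the first variant typically slows down as threads are added while the second does not, which is the gap between "bottleneck" and "non-bottleneck" behavior that the paper's experiments quantify.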
Similar Resources
Message Passing and Threads
In this chapter we examine two fundamental, although low-level, approaches to expressing parallelism in programs. Over the years, numerous different approaches to designing and implementing parallel programs have been developed (e.g., see the excellent survey article by Skillicorn and Talia [870]). However, over time, two dominant alternatives have emerged: message passing and multithreading. T...
Executing Communication-Intensive Irregular Programs Efficiently
We consider the problem of efficiently executing completely irregular, communication-intensive parallel programs. Completely irregular programs are those whose number of parallel threads as well as the amount of computation performed in each thread vary during execution. Our programs run on MIMD computers with some form of space-slicing (partitioning) and time-slicing (scheduling) support. A hard...
Architectural and Software Support for Executing Numerical Applications on High Performance Computers By
Numerical applications require large amounts of computing power. Although shared memory multiprocessors provide a cost-effective platform for parallel execution of numerical programs, parallel processing has not delivered the expected performance on these machines. There are two crucial steps in parallel execution of numerical applications: (1) effective parallelization of an application and (2) ...
Multigranular Thread Support in WaveScalar
WaveScalar is a recently proposed scalable microarchitecture. The original WaveScalar research developed and evaluated an ISA and microarchitecture that efficiently executes a single, coarse-grain thread. In this paper, we expand that design to support multiple, simultaneously executing threads. Four mechanisms make this possible: (1) instructions that enable and disable wave-ordered memory; (2...
Survey of optimizing techniques for parallel programs running on computer clusters
In the current field of high performance computing, cluster technologies play an ever-increasing role. This paper tries to summarize state-of-the-art techniques for optimization of parallel programs designed to run on computer clusters. Optimizing parallel programs is a much harder task than optimizing sequential programs due to the increased complexity caused by communication and synchronization...